Forget about stemming, that shit sucks

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
stemmer = PorterStemmer()

# Before you can stem the words in that string, you need to separate all the words in it:
tokens: list = word_tokenize("HELLO JOHN DOE THERE YOU GO")

# Stem each token (the list built above is named `tokens`)
stemmed_words = [stemmer.stem(word) for word in tokens]

^-- PorterStemmer sucks

from nltk.stem.snowball import SnowballStemmer

This sucks too, basically it spits out stems that aren't real words, like "because" turning into "becaus"
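Quick sketch to see the problem side by side. The sample words are made up for illustration (only "because" comes from the note above), and it assumes nltk is installed. The stemmers themselves don't need any extra nltk data downloads, so this skips word_tokenize and just uses a hardcoded list.

```python
from nltk.stem import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

porter = PorterStemmer()
# SnowballStemmer needs a language name; "english" is assumed here
snowball = SnowballStemmer("english")

# Hypothetical sample words, chosen only to compare the two stemmers
sample_words = ["because", "running", "stemming"]

# Collect the stems from both stemmers so they can be compared per word
porter_stems = {word: porter.stem(word) for word in sample_words}
snowball_stems = {word: snowball.stem(word) for word in sample_words}

for word in sample_words:
    print(f"{word}: porter={porter_stems[word]} snowball={snowball_stems[word]}")
```

Both stemmers chop suffixes by rule and never check the result against a dictionary, which is why "becaus" comes out of either one.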

Created: 2024-03-06